Collective Entity Disambiguation with Structured Gradient Tree Boosting

Authors

  • Yi Yang
  • Ozan Irsoy
  • Kazi Shefaet Rahman
Abstract

We present a gradient-tree-boosting-based structured learning model for jointly disambiguating named entities in a document. Gradient tree boosting is a widely used machine learning algorithm that underlies many top-performing natural language processing systems. Surprisingly, most work limits gradient tree boosting to regular classification or regression problems, despite the structured nature of language. To the best of our knowledge, ours is the first work to employ the structured gradient tree boosting (SGTB) algorithm for collective entity disambiguation. By defining global features over previous disambiguation decisions and jointly modeling them with local features, our system is able to produce globally optimized entity assignments for the mentions in a document. Exact inference is prohibitively expensive for our globally normalized model. To address this, we propose Bidirectional Beam Search with Gold path (BiBSG), an approximate inference algorithm that is a variant of standard beam search. BiBSG uses global information from both past and future decisions to perform better local search. Experiments on standard benchmark datasets show that SGTB significantly improves upon published results. In particular, SGTB outperforms the previous state-of-the-art neural system by nearly 1% absolute accuracy on the popular AIDA-CoNLL dataset.
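As a rough illustration of the gold-path component of BiBSG (not the paper's actual implementation), the sketch below runs a plain left-to-right beam search over per-mention candidate entities and simply forces the gold partial assignment back into the beam whenever it falls out. The candidate lists, entity names, and scoring function are toy placeholders; the real BiBSG additionally exploits information from both directions and scores hypotheses with the learned SGTB model.

```python
# Hypothetical sketch: beam search over entity assignments that always keeps
# the gold partial path in the beam. Everything below is illustrative only.


def beam_search_with_gold(candidates, gold, score_fn, beam_size=4):
    """Left-to-right beam search over per-mention entity candidates.

    candidates: list of candidate-entity lists, one per mention
    gold: list of gold entity ids (its prefix is kept in the beam at every step)
    score_fn: scores a partial assignment tuple (higher is better)
    """
    beam = [()]  # partial assignments, one tuple per hypothesis
    for i, cands in enumerate(candidates):
        expanded = [path + (e,) for path in beam for e in cands]
        expanded.sort(key=score_fn, reverse=True)
        beam = expanded[:beam_size]
        gold_prefix = tuple(gold[: i + 1])
        if gold_prefix not in beam:      # force the gold partial path back in
            beam[-1] = gold_prefix
    return max(beam, key=score_fn)


if __name__ == "__main__":
    # Toy example: three mentions with two candidate entities each.
    cands = [["Paris_FR", "Paris_TX"], ["France", "Texas"], ["Seine", "Rio"]]
    gold = ["Paris_FR", "France", "Seine"]

    def score_fn(path):
        # Toy "local + global" score: prefer non-Texas entities and coherent pairs.
        local = sum(1.0 for e in path if not e.endswith("TX"))
        coherence = sum(1.0 for a, b in zip(path, path[1:])
                        if (a, b) in {("Paris_FR", "France"), ("France", "Seine")})
        return local + coherence

    print(beam_search_with_gold(cands, gold, score_fn))
```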


Similar Resources

Combining Gradient Boosting Machines with Collective Inference to Predict Continuous Values

Gradient boosting of regression trees is a competitive procedure for learning predictive models of continuous data that fits the data with an additive non-parametric model. The classic version of gradient boosting assumes that the data is independent and identically distributed. However, relational data with interdependent, linked instances is now common and the dependencies in such data can be...
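The additive-model view described above can be illustrated with a small, self-contained sketch. The code below is a hypothetical from-scratch gradient-boosted regression ensemble under squared loss (each tree is fit to the current residuals, i.e. the negative gradient) using scikit-learn decision trees and toy data; it is not the collective-inference method of the cited paper.

```python
# Minimal sketch of gradient boosting with regression trees under squared loss:
# the ensemble is the additive sum of an initial constant and scaled trees,
# each fit to the residuals of the current prediction. Illustrative only.

import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gbrt(X, y, n_trees=50, learning_rate=0.1, max_depth=2):
    """Fit an additive ensemble; returns (initial constant, learning rate, trees)."""
    f0 = float(y.mean())                 # constant initial model
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred             # negative gradient of 0.5 * (y - f)^2
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred += learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, learning_rate, trees


def predict_gbrt(model, X):
    f0, learning_rate, trees = model
    return f0 + learning_rate * sum(t.predict(X) for t in trees)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
    model = fit_gbrt(X, y)
    print("train MSE:", np.mean((predict_gbrt(model, X) - y) ** 2))
```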

Pair-Linking for Collective Entity Disambiguation: Two Could Be Better Than All

Collective entity disambiguation, or collective entity linking, aims to jointly resolve multiple mentions by linking them to their associated entities in a knowledge base. Previous works are largely based on the underlying assumption that entities within the same document are highly related. However, the extent to which these mentioned entities are actually connected in reality is rarely studied and...

Gradient Boosting for Conditional Random Fields

In this paper, we present a gradient boosting algorithm for tree-shaped conditional random fields (CRFs). Conditional random fields are an important class of models for accurate structured prediction, but effective design of the feature functions is a major challenge when applying CRF models to real-world data. Gradient boosting, which...

Entity Disambiguation using Freebase and Wikipedia

This thesis addresses the problem of entity disambiguation, which involves identifying important phrases in a given text and linking them to the appropriate entities they refer to. For this work, information extracted from both Freebase and Wikipedia served as the knowledge base. A fully functional entity disambiguation tool is made available online, and the challenges involved in each stage of...

Tree-Structured Boosting: Connections Between Gradient Boosted Stumps and Full Decision Trees

José Marcio Luna, Eric Eaton, Lyle H. Ungar, Eric Diffenderfer, Shane T. Jensen, Efstathios D. Gennatas, Mateo Wirth, Charles B. Simone II, Timothy D. Solberg, and Gilmer Valdes (Dept. of Radiation Oncology, University of Pennsylvania, {Jose.Luna,Eric.Diffenderfer}@uphs.upenn.edu; Dept. of Computer and Information Science, University of Pennsylvania, {eeaton,ungar}@cis.upenn.edu; ...)


Journal:
  • CoRR

Volume: abs/1802.10229

Pages: -

Publication year: 2018